SUMMARY

The major thrust of our effort was focused on the theory and practice of responsive
(fault-tolerant,  real-time)  computing  in  parallel  and  distributed  processing  environments. 

New, efficient methods of system testing have been developed which shorten
multiprocessor testing time by orders of magnitude and can therefore be used at system
boot (previous techniques took prohibitively long).

A new design framework for responsive computing was developed and is being
implemented  for  validation.  This  framework  is  based  on  consensus  which  can  be  used  to 
provide  synchronization,  reliable  communication,  fault  diagnosis,  checkpointing  and  even 
scheduling  in  multiprocessor  environments. 

We  have  formalized  and  quantified  the  space-time  tradeoff  for  efficient  fault  recovery. 
The  system  model  is  a  graph,  and  we  were  especially  successful  in  analysis  of  meshes  and 
hypercubes. 

We  developed  a  new  method  called  naturally  redundant  algorithms  which  allows 
efficient  implementation  of  application-specific  techniques. 

We also developed and tested a comprehensive formal model for fault-tolerant parallel
algorithm  design. 

We  have  made  significant  contributions  to  the  theory  and  practice  of  parallel  computer 
network  design,  analysis  and  fault  tolerance. 

We  have  also  proposed  a  novel  approach  to  searching  by  combining  existing  search 
techniques  in  order  to  maximize  performance  by  finding  better  solutions  in  a  shorter 
amount  of  time. 

This  report  consists  of  a  more  detailed  description  of  each  accomplishment,  followed 
by  a  relevant  bibliography. 


A.  Topological  Testing,  Array  Testing  and  Diagnosis. 

We  have  introduced  a  new  concept,  topological  testing,  and  demonstrated  several 
applications  in  the  area  of  multiprocessor  testing.  Topological  testing  uses  graph  theoretic 
optimization  methods  such  as  the  Traveling  Salesman  Problem,  the  Chinese  Postman 
Problem, path covering and partitioning to minimize the test time. The
topological testing technique can be applied to test a system's behavior and its organization
at each level of the system's hierarchy; namely, circuit, logic, register transfer, instruction
and processor-memory-switch levels. Specifically, the topological testing approach is
demonstrated  by  developing  tests  for  the  multistage  interconnection  network  and  the 
hypercube  network.  Time  optimization  for  the  testing  of  these  networks  gives  very 
promising  results  by  taking  advantage  of  inherent  parallelism  and  removing  test 
redundancy.  Three  orders  of  magnitude  improvement  is  achieved  by  applying  topological 
testing techniques to the testing of an existing multistage interconnection network.
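
As an illustration of the graph-theoretic view, consider testing every link of a network with a single closed route. When every node has even degree, the Chinese Postman Problem reduces to finding an Euler circuit, so no link has to be traversed twice. The Python sketch below is only an assumption-laden example, not the procedure used in this work: the function names are hypothetical, the network is a 4-dimensional hypercube (every node has degree 4), and the circuit is found with Hierholzer's algorithm.

```python
from collections import defaultdict


def hypercube_edges(dim):
    """Undirected links of a dim-dimensional hypercube; nodes are 0 .. 2**dim - 1."""
    return [(u, u ^ (1 << b))
            for u in range(1 << dim)
            for b in range(dim)
            if u < (u ^ (1 << b))]


def euler_circuit(edges, start=0):
    """Hierholzer's algorithm: a closed walk that uses every link exactly once.

    Assumes the graph is connected and every node has even degree.
    """
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    used = [False] * len(edges)
    stack, circuit = [start], []
    while stack:
        node = stack[-1]
        # Drop links that were already traversed from their other endpoint.
        while adj[node] and used[adj[node][-1][1]]:
            adj[node].pop()
        if adj[node]:
            nxt, idx = adj[node].pop()
            used[idx] = True
            stack.append(nxt)
        else:
            circuit.append(stack.pop())
    return circuit


if __name__ == "__main__":
    links = hypercube_edges(4)
    route = euler_circuit(links)
    print(len(links), "links covered by a route of", len(route) - 1, "hops")
```

For networks with odd-degree nodes the route must repeat some links, and choosing those repeats optimally is the general Chinese Postman Problem, which this sketch does not address.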

We  have  identified  a  new  solution  to  contention  testing.  Using  concepts  of  square  of 
a  graph  and  coloring,  we  have  devised  optimal  algorithms  for  contention  testing  in  paths 
(busses), cycles (rings), trees, meshes and hypercubes. The universality and power of the
proposed approach have proved useful not only in system testing but also in
system integration procedures.
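
The role of the square of a graph and of coloring can be sketched as follows. The example assumes, purely for illustration, that two nodes may not be tested in the same step when they are within two hops of each other; under that assumption, nodes that receive the same color in the squared graph can be tested concurrently, and the number of colors is the number of test steps. The greedy coloring used here is a simple stand-in and is not claimed to be one of the optimal algorithms mentioned above; all function names are hypothetical.

```python
def ring(n):
    """Adjacency sets of an n-node cycle (ring)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def square(adj):
    """Square of a graph: u and v are adjacent when their distance in G is 1 or 2."""
    sq = {u: set() for u in adj}
    for u in adj:
        for v in adj[u]:
            sq[u].add(v)
            for w in adj[v]:
                if w != u:
                    sq[u].add(w)
    return sq


def greedy_coloring(adj):
    """Give each node the smallest color not used by its already-colored neighbors."""
    color = {}
    for u in sorted(adj):
        taken = {color[v] for v in adj[u] if v in color}
        c = 0
        while c in taken:
            c += 1
        color[u] = c
    return color


if __name__ == "__main__":
    schedule = greedy_coloring(square(ring(6)))
    steps = {}
    for node, step in schedule.items():
        steps.setdefault(step, []).append(node)
    for step in sorted(steps):
        print("test step", step, "nodes", steps[step])
```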

We developed efficient methods for testing packet-switched multistage interconnection
networks. In addition to testing the data paths and routing capabilities, we provide
tests  for  detecting  faults  in  the  control  circuitry,  including  the  conflict  resolution 
capabilities.  Using  a  general  model  of  the  switch,  we  constructed  testing  sequences  for  the 
internal functions of the f x f switch requiring only O(f^2 2^f) tests in the case of round-robin
priority and O(f 2^f) in the case of fixed priority (f is usually a constant that is less
than or equal to 8). We also developed algorithms to test the entire network using, at
most, twice the number of tests needed to test a switch, independently of the network size,
which results in Θ(log N) testing time for an N-processor network. We demonstrated that
our  method  achieves  higher  coverage  and  several  orders  of  magnitude  reduction  in  the 
testing  time  of  complex  multiprocessor  systems  when  compared  to  the  previous  methods. 

We  also  developed  an  overlapped  segmentation  method  for  testing  cellular  arrays. 
The  method  optimizes  the  number  of  tests  for  pattern  sensitivity  faults  in  combinational 
logic. 

B. Responsive Computer Systems Design Framework.

The  simple  idea  of  consensus  is  to  share  information  among  a  group  of  processing 
elements, preferably in a fault-tolerant manner such that the fault-free members of the PE
population  are  able  to  consistently  agree  on  and  produce  correct  results  despite  the  actions, 
malicious  or  not,  of  the  faulty  segment  of  the  population.  The  importance  of  the  problem 
stems  from  its  omnipresence.  This  problem  is  at  the  core  of  protocols  handling 
synchronization,  reliable  communication,  resource  allocation,  task  scheduling, 
reconfiguration,  replicated  file  systems,  sensor  reading  and  other  functions. 

We  have  just  completed  an  extensive  survey  of  system-level  diagnosis  and  Byzantine 
protocols  and  aim  at  design  and  implementation  of  responsive  (fault-tolerant,  real-time) 
consensus  for  system  diagnosis.  Our  approach  is  designed  to  handle  large  heterogeneous 
distributed-system  environments  and  is  based  on  system  partitioning  and  protocol 
hierarchy. 

The  main  result  is  the  real  time  and  fault  tolerance  analysis  of  diagnosis  protocols 
used  towards  implementing  a  responsive  system.  We  examine  the  use  of  redundancy 
management  in  both  time  (reassignment  of  tasks)  and  space  (masking  faults  by  voting  on 
the  result),  and  the  tradeoffs  involved.  Finally,  given  a  real-time  system  and  a  schedule  of 
tasks,  we  can  determine  what  changes  need  to  be  made  in  order  for  the  system  to  be 
responsive. 
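
The two forms of redundancy management can be contrasted in a few lines of Python. This is a minimal sketch under stated assumptions, not the analysis performed in this work: `majority_vote` and `retry` are hypothetical names, replicas are assumed to return hashable results, and a task is assumed to report whether its own result passed an acceptance check.

```python
from collections import Counter


def majority_vote(results):
    """Space redundancy: mask faults by voting on the results of several replicas.

    Returns the value reported by a strict majority, or None if there is none.
    Expects a non-empty list.
    """
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None


def retry(task, attempts):
    """Time redundancy: rerun (reassign) a task until its result is accepted.

    `task(attempt)` must return a pair (accepted, value).
    Returns (value, attempts_used), or (None, attempts) if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        accepted, value = task(attempt)
        if accepted:
            return value, attempt
    return None, attempts


if __name__ == "__main__":
    print(majority_vote([7, 7, 3]))              # one faulty replica is masked
    print(retry(lambda a: (a >= 2, 42), 3))      # first attempt fails, second succeeds
```

Voting costs extra processors but little extra time, whereas retrying costs no extra processors but lengthens the worst-case completion time, which is the tradeoff that has to be checked against each task's deadline.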

Since  fault  tolerance  is  created  by  the  management  of  time  and  space  redundancy  in 
software,  these  techniques  are  applicable  to  real-time  distributed  systems.  Also,  because 
this is a software technique, the tradeoff between time criticality and throughput is readily
managed  in  various  partitions  of  the  system. 


[...] of protocols which support [...] synchronization, fault diagnosis and load sharing for specific fault
models. We completed the design of a consensus protocol under a crash fault model. We
plan to incorporate other models as well.
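
To make the crash fault model concrete, the following Python sketch simulates the textbook flooding protocol for consensus with crash faults (often called FloodSet). It is not the protocol designed in this work, whose details are not given here. The assumptions are: communication is synchronous and proceeds in rounds, at most `max_faults` processes crash, a crashing process may deliver its last message to only some recipients, and every surviving process decides on the minimum value it has seen after `max_faults + 1` rounds. All names are hypothetical.

```python
def floodset(initial, crashes, max_faults):
    """Simulate flooding consensus under crash faults.

    initial:    dict mapping process id -> proposed value
    crashes:    dict mapping process id -> (round, recipients of its final message)
    max_faults: number of crashes the run is dimensioned for
    Returns a dict mapping each surviving process id -> decided value.
    """
    known = {p: {v} for p, v in initial.items()}
    alive = set(initial)
    for rnd in range(1, max_faults + 2):          # rounds 1 .. max_faults + 1
        inbox = {p: set() for p in alive}
        crashed_now = set()
        for sender in alive:
            if sender in crashes and crashes[sender][0] == rnd:
                receivers = set(crashes[sender][1])   # partial delivery, then stop
                crashed_now.add(sender)
            else:
                receivers = alive
            for r in receivers:
                if r in inbox:
                    inbox[r] |= known[sender]
        alive -= crashed_now
        for p in alive:
            known[p] |= inbox[p]
    return {p: min(known[p]) for p in alive}


if __name__ == "__main__":
    proposals = {0: 5, 1: 3, 2: 8, 3: 1}
    # Process 3 crashes in round 1 after reaching only process 0.
    print(floodset(proposals, {3: (1, [0])}, max_faults=1))
```

Running the same scenario with `max_faults=0` (a single round) leaves the survivors with different decisions, which shows why one round more than the number of tolerated crashes is needed in this model.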

C. Space-Time Tradeoffs for Efficient Fault Recovery and Fault Tolerance.

We have formalized and quantified the space-time tradeoff for efficient fault recovery.
The mathematical model which proves to be most appropriate is provided by the theory of
graphs. We model a computer architecture as a graph G and a job mapped into G as a
subgraph H. The dependability qualities of such a system with or without a fault are
determined  by  the  resiliency  triple  defined  by  three  parameters:  multiplicity,  robustness  and 
configurability.  Multiplicity  indicates  the  number  of  jobs  represented  by  H  that  can  be 
mapped  (executed  simultaneously)  on  a  graph  (system)  G.  Robustness  represents  the 
number  of  jobs  that  can  be  mapped  such  that  each  of  their  tasks  is  executed  on  a  different 
processor. The first parameter measures redundancy in space, while the second
one  corresponds  to  redundancy  in  time.  Configurability  counts  the  number  of  ways  a 
particular  job  can  be  mapped  onto  a  system  G.  We  have  developed  algorithms  for 
evaluation  of  the  above  parameters  in  mesh  and  hypercube  networks.  We  have  also 
proposed  algorithms  which  optimized  the  reconfiguration  of  a  faulty  job  with  minimum 
space/time  overhead.  Our  approach  explores  the  inherent  fault  tolerance  of  multiprocessor 
systems  and  exploits  the  topological  relationship  between  the  systems  architecture  and  the 
target  applications.  The  polynomial  solutions  are  provided  for  paths  and  trees  mapped  onto 
mesh and hypercube networks. In general, the problem is equivalent to subgraph
isomorphism and cannot be solved efficiently.
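
As a small worked example of the mapping problem, the Python sketch below counts the ways a job whose tasks form a path can be placed on a mesh, optionally avoiding faulty processors. It reads configurability as the number of ordered mappings of the path onto distinct, adjacent, fault-free processors; that reading, the function names, and the treatment of a reversed path as a separate mapping are assumptions made for illustration. The count is obtained by exhaustive search, so it is practical only for small instances and is not one of the polynomial algorithms referred to above.

```python
def mesh(rows, cols):
    """Adjacency sets of a rows x cols mesh; nodes are (row, col) pairs."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            adj[(r, c)] = {(x, y) for x, y in nbrs if 0 <= x < rows and 0 <= y < cols}
    return adj


def count_path_mappings(adj, k, faulty=frozenset()):
    """Count ordered mappings of a k-task path onto distinct fault-free processors.

    Consecutive tasks must be placed on adjacent processors.
    """
    def extend(node, visited, remaining):
        if remaining == 0:
            return 1
        return sum(extend(n, visited | {n}, remaining - 1)
                   for n in adj[node]
                   if n not in visited and n not in faulty)

    return sum(extend(s, {s}, k - 1) for s in adj if s not in faulty)


if __name__ == "__main__":
    g = mesh(2, 2)
    print(count_path_mappings(g, 3))                     # fault-free system
    print(count_path_mappings(g, 3, faulty={(0, 0)}))    # one faulty processor
```

Comparing the two counts shows how a single fault reduces the number of placements still available for reconfiguring the job.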

D. Naturally Redundant Algorithms.

We  have  characterized  a  class  of  algorithms  suitable  for  fault-tolerant  execution  in 
multiprocessor  systems  by  exploiting  the  existing  embedded  redundancy  in  the  problem 
variables. Because of this unique property, no extra computations need be superimposed
on the algorithm in order to provide redundancy for fault recovery, as well as fault detection
in some cases. A forward recovery scheme is thus employed with very low time overhead.
The method is applied to the implementation of two iterative algorithms: solution of
Laplace  equations  by  Jacobi's  method  and  the  calculation  of  the  invariant  distribution  of  a 
Markov  chain.  Experiments  show  less  than  15%  performance  degradation  for  significant 
problem  instances  in  fault-free  situations,  and  as  low  as  2.43%  in  some  cases.  The  extra 
computation  time  needed  for  locating  and  recovering  from  a  detected  fault  does  not  exceed 
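
The idea of redundancy already present in the problem variables can be illustrated with the Markov chain case. In the Python sketch below, the invariant distribution is computed by power iteration, and the redundancy exploited is the fact that the components of a probability vector sum to 1, so a single lost component can be rebuilt from the others without rolling back. Whether this is the exact relation used in this work is not stated here, so the choice of invariant, the fault injection, and all names should be read as assumptions.

```python
def step(x, P):
    """One power-iteration step x <- x P for a row-stochastic matrix P."""
    n = len(x)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]


def recover(x, lost):
    """Forward recovery: rebuild one lost component from the invariant sum(x) == 1."""
    fixed = list(x)
    fixed[lost] = 1.0 - sum(v for i, v in enumerate(x) if i != lost)
    return fixed


def invariant_distribution(P, iterations=200, fault_at=None, lost=0):
    """Iterate from the uniform vector; optionally lose one component at one step."""
    n = len(P)
    x = [1.0 / n] * n
    for k in range(iterations):
        x = step(x, P)
        if k == fault_at:
            x[lost] = float("nan")    # simulate a processor losing its component
            x = recover(x, lost)      # repaired in place, no recomputation of the step
    return x


if __name__ == "__main__":
    P = [[0.9, 0.1],
         [0.5, 0.5]]
    print(invariant_distribution(P))
    print(invariant_distribution(P, fault_at=3, lost=1))
```

The recovery costs one pass over the vector, which is small compared with an iteration step, and the iteration simply continues from the repaired state.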

E. Comprehensive Formal Model for Fault-Tolerant Parallel Algorithms.

We developed a comprehensive formal model for fault-tolerant parallel algorithms and
a  general  methodology  for  designing  reliable  applications  for  multiprocessor  environments. 
The  model  relies  on  the  formalization  of  fault-tolerant  concepts  by  means  of  three  nested 
system predicates and on properties ruling their interrelationships. This rigorous framework
facilitates the study of the specific properties that enable an algorithm to tolerate
faults.  The  consequence  of  that  is  the  outline  of  systematic  design  techniques  that  can  be 
utilized  to  add  fault-tolerant  properties  to  algorithms  while  preserving  their  functional 
characteristics.  The  proposed  model  also  allows  for  the  quantification  of  the  costs  of  fault 
tolerance  in  terms  of  space  and  time  redundancy,  clarifying  the  tradeoffs  which  are  inherent 
to  the  fault-tolerant  design  process.  The  model  and  design  methodology  are  validated  by 
the  uniform  application  of  their  principles  in  the  study  of  several  well-known  fault-tolerant 
techniques.  The  analysis  of  the  cost  of  fault  tolerance  in  each  of  these  techniques  points  out 
that  the  exploitation  of  natural  redundancy,  in  applications  where  this  property  is  present, 
will  lead  to  the  design  of  fault-tolerant  parallel  algorithms  with  very  attractive  cost/benefit 
ratio. 

F. Parallel Computer Network Design, Analysis and Fault Tolerance.

We developed a comprehensive technique to evaluate the quality of multistage
interconnection  networks  with  respect  to  their  combinatorial  power.  We  analyzed  both 
fault-intolerant  and  fault-tolerant  networks  and  evaluated  them  with  respect  to  graceful 
degradation capability.

We  also  proposed  a  new  cylindrical  banyan  multicomputer  architecture  that  still  has 
the  best-to-date  cost  x  delay  product. 

G. Search Techniques.

We  continue  our  experiments  with  a  hybrid  algorithm  where  multiple  algorithms 
execute  the  same  problem  and  exchange  information.  This  promising  approach, 
implemented  on  shared-memory  parallel  processors,  gave  us  superlinear  speedup.  We 
have achieved an order-of-magnitude speedup on a two-processor system. The only way
we  can  explain  it  is  that,  in  fact,  we  have  created  a  new  algorithm.  We  are  currently 
implementing  the  same  approach  on  a  multiprocessor  developed  at  Microelectronics  and 
Computer Technology Corporation in Austin, Texas.
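
The cooperation pattern behind the hybrid algorithm can be sketched in Python as follows. This is a sequential toy version under several assumptions: the two cooperating "algorithms" are simple improvement heuristics (segment reversal and city exchange) rather than tabu search or simulated annealing, they take turns instead of running on separate processors, and the only information exchanged is the best tour found so far. It shows how the exchange works and makes no claim about speedup. All names are hypothetical.

```python
import math
import random


def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def reverse_move(tour, rng):
    """Reverse a random segment of the tour."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def swap_move(tour, rng):
    """Exchange two random cities."""
    i, j = rng.sample(range(len(tour)), 2)
    new = list(tour)
    new[i], new[j] = new[j], new[i]
    return new


def hybrid_search(pts, rounds=200, steps=20, seed=0):
    """Two heuristics alternate, each starting from the shared best tour."""
    rng = random.Random(seed)
    best = list(range(len(pts)))
    best_len = tour_length(best, pts)
    for _round in range(rounds):
        for move in (reverse_move, swap_move):
            current, current_len = best, best_len      # read the shared incumbent
            for _step in range(steps):
                cand = move(current, rng)
                cand_len = tour_length(cand, pts)
                if cand_len < current_len:
                    current, current_len = cand, cand_len
            if current_len < best_len:                 # publish any improvement
                best, best_len = current, current_len
    return best, best_len


if __name__ == "__main__":
    cities = [(0, 0), (3, 1), (1, 4), (5, 5), (2, 2), (6, 0), (4, 3), (0, 5)]
    tour, length = hybrid_search(cities)
    print(tour, round(length, 3))
```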

We  designed  a  special-purpose  processor  for  efficient  processing  of  a  tabu  search  for 
a  traveling  salesman  problem.  Our  speedup  analysis  indicates  almost  three  orders  of 
magnitude  improvement  over  state-of-the-art  general-purpose  processors. 

TRANSITIONS AND DoD INTERACTIONS

A robust implementation of the two-dimensional Fast Fourier Transform algorithm
was mapped onto IBM's GF-11 parallel computer, in addition to topological testing, at
IBM's T. J. Watson Research Center in New York.

The Hybrid Algorithm Technique was implemented on Motorola's parallel computer,
Pleiades,  at  Microelectronics  and  Computer  Technology  Corporation  in  Austin,  Texas. 
The new technique offers super-linear speedup for search methods, such as tabu and
simulated  annealing. 

IBM's GF-11 supercomputer controls the interconnection network centrally. This
control  scheme  makes  the  run-time  network  configuration  very  expensive.  Graph 
theoretical  approaches  were  investigated  by  Banu  Ozden  during  the  summer  of  1990  at  the 
T.  J.  Watson  Research  Center  in  Yorktown  Heights,  New  York,  to  emulate  distributed 
control on the Benes network of the GF-11 parallel computer, which reduced the cost of
dynamic  interprocessor  communication. 

The  Hybrid  Algorithm  Technique  was  successfully  implemented  for  the  Traveling 
Salesman Problem on a network of UNIX machines by Mihir A. Pandya during the
summer  of  1990  at  IBM,  Austin,  Texas.  The  distributed  processing  implementation  used 
Remote  Procedure  Calls  and  incorporated  real-time  and  fault-tolerant  features  so  that  the 
execution  completed  in  the  designated  time,  and  results  were  available  as  long  as  a  single 
processor  was  available.  Near-linear  speedup  was  obtained  on  2-4  processors. 

The topological testing approach was used on the Benes network of IBM's
Gigaflop-11 (GF-11) multiprocessor. My student spent the summer of 1989 at the T. J.
Watson Research Center in Yorktown Heights, New York, and applied a combination of
level-sensitive scan design (LSSD) and topological testing to the GF-11 supercomputer,
which resulted in significantly higher fault coverage and improvement in test time.